Fold-specific substitution matrices for protein classification

Authors

  • R. B. Vilim
  • R. M. Cunningham
  • B. Lu
  • P. Kheradpour
  • Fred J. Stevens
Abstract

MOTIVATION: Methods that focus on secondary structures, such as Position Specific Scoring Matrices and Hidden Markov Models, have proved useful for assigning proteins to families. However, for assigning proteins to an attribute class within a family, these methods may introduce more free parameters than are needed, because a family has fewer members and less variability among its sequences. We describe a method for organizing the proteins in a family that reduces the number of parameters by up to an order of magnitude. The basis is the log odds ratio commonly used to measure similarity. We adapt it to characterize the sequence dissimilarities that give rise to attribute differentiation. This leads to the definition of Class Attribute Substitution Matrices (CLASSUM), a dual of the BLOSUM.

RESULTS: The method was applied to classify sequences hierarchically in the lambda and kappa subgroups of the immunoglobulin superfamily. Positions conferring class were identified from the degree of amino acid variability at each position. The CLASSUM computed for these positions correctly classified more than 90% of the test data, compared with 35-50% for BLOSUM-62. The expected value for a random matrix is 14%. The results suggest that family-specific, data-derived substitution matrices can improve the resolution of automated methods that use generic substitution matrices to search for and classify proteins.
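The two ingredients named in the abstract, per-position variability and a log odds ratio, can be illustrated with a minimal Python sketch. This is not the authors' CLASSUM procedure, which the abstract does not specify; it is a generic BLOSUM-style calculation. The function names `column_entropy` and `log_odds_scores`, the use of Shannon entropy as the variability measure, and the pseudocount are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column.

    Assumption: entropy stands in for "degree of amino acid variability".
    """
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def log_odds_scores(columns, pseudocount=1.0):
    """BLOSUM-style log-odds scores from residue pairs seen within columns.

    Returns {(a, b): log2(observed / expected)} for unordered pairs a <= b.
    The pseudocount (an assumption) keeps unseen pairs finite.
    """
    pair_counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    alphabet = sorted({r for col in columns for r in col})
    pairs = [(a, b) for i, a in enumerate(alphabet) for b in alphabet[i:]]
    total = sum(pair_counts.values()) + pseudocount * len(pairs)
    observed = {p: (pair_counts[p] + pseudocount) / total for p in pairs}

    # Marginal residue frequencies implied by the observed pair frequencies.
    freq = {a: 0.0 for a in alphabet}
    for (a, b), q in observed.items():
        if a == b:
            freq[a] += q
        else:
            freq[a] += q / 2
            freq[b] += q / 2

    scores = {}
    for (a, b), q in observed.items():
        expected = freq[a] * freq[b] if a == b else 2 * freq[a] * freq[b]
        scores[(a, b)] = math.log2(q / expected)
    return scores


if __name__ == "__main__":
    toy_columns = ["AAAA", "AAGG", "GGGG"]  # hypothetical alignment columns
    for col in toy_columns:
        print(col, column_entropy(col))
    print(log_odds_scores(toy_columns))
```

In this toy input, identical pairs score above zero and the mixed pair scores below zero, which is the usual reading of a similarity matrix. A class-oriented matrix in the spirit of the abstract would instead be built from pairs drawn across attribute classes, but how the authors do that is not stated here.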


Similar resources

Amino acid substitution matrices for protein conformation identification

Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. Although widely used, the matrices derived from homologous sequence segments, such as Dayhoff’s PAM matrices and Henikoff’s BLOSUM matrices, are not specific for protein conformation identification. Using a different approach...


Recognition of analogous and homologous protein folds: analysis of sequence and structure conservation.

An analysis was performed on 335 pairs of structurally aligned proteins derived from the structural classification of proteins (SCOP http://scop.mrc-lmb.cam.ac.uk/scop/) database. These similarities were divided into analogues, defined as proteins with similar three-dimensional structures (same SCOP fold classification) but generally with different functions and little evidence of a common ance...


3D representations of amino acids—applications to protein sequence comparison and classification

The amino acid sequence of a protein is the key to understanding its structure and ultimately its function in the cell. This paper addresses the fundamental issue of encoding amino acids in such a way that the representation of a protein sequence facilitates the decoding of its information content. We show that a feature-based representation in a three-dimensional (3D) space derived from amino a...


Position Dependent and Independent Evolutionary Models Based on Empirical Amino Acid Substitution Matrices

Evolutionary models measure the probability of amino acid substitutions occurring over different evolutionary distances. We examine various evolutionary models based on empirically derived amino acid substitution matrices. The models are constructed using the PAM and BLOSUM amino acid substitution matrices. We rescale these matrices by raising them to powers to model substitution patterns that ...


Improving Chernoff criterion for classification by using the filled function

Linear discriminant analysis is a well-known matrix-based dimensionality reduction method. It is a supervised feature extraction method used in two-class classification problems. However, it is incapable of dealing with data in which classes have unequal covariance matrices. To address this issue, the Chernoff distance is an appropriate criterion to measure distances between distributions. In the p...



Journal:
  • Bioinformatics

Volume 20, Issue 6

Pages -

Publication date: 2004